DOCUMENT RESUME

ED 227 165                                        TM 830 187

AUTHOR       Koffler, Stephen L.
TITLE        A Longitudinal Analysis of Curricular Validity for a
             Minimum Competency Testing Program.
PUB DATE     Apr 83
NOTE         26p.; Paper presented at the Annual Meeting of the
             American Educational Research Association (67th,
             Montreal, Quebec, Canada, April 11-15, 1983). Tables
             1 and 2 contain small print.
PUB TYPE     Speeches/Conference Papers (150) -- Reports -
             Research/Technical (143)
EDRS PRICE   MF01/PC02 Plus Postage.
DESCRIPTORS  Basic Skills; Court Litigation; *Curriculum;
             Elementary Secondary Education; Longitudinal Studies;
             *Mathematics Instruction; *Minimum Competency
             Testing; Psychometrics; *Reading Instruction; Testing
             Problems; Testing Programs; *Test Validity
IDENTIFIERS  Content Validity; *Curricular Validity; Debra P v
             Turlington; Modified Caution Index; *New Jersey
             Minimum Basic Skills Program

ABSTRACT



This study examined the curricular validity of the
New Jersey Basic Skills test, a minimum competency test administered
to all public school students in grades 3, 6, 9, and 11 to measure
basic skills in reading and mathematics. Based on examinations of a
Modified Caution Index, there were differences in the unusual response
patterns for both reading and mathematics. This result suggests that
within districts there may be differences in the content coverage and
emphasis placed on some of the subsets of items contained on a
minimum competency test. Because differences were noted across
districts and curricular programs, there is the suggestion that there
may be problems with using one test. Other, more detailed
non-test-based analyses should be conducted to further examine the
curricular validity and also the instructional validity of this test.
The present analyses provide an initial insight into possible
differences between test and curricular matches. (Author/PN)







***********************************************************************
*  Reproductions supplied by EDRS are the best that can be made       *
*  from the original document.                                        *
***********************************************************************






A Longitudinal Analysis of Curricular Validity
For A Minimum Competency Testing Program



Stephen L. Koffler

New Jersey State Department of Education 



"PERMISSION TO REPRODUCE THIS 
MATERIAL HAS BEEN GRANTED BY



TO THE EDUCATIONAL RESOURCES 
INFORMATION CENTER (ERIC)" 



Paper presented at the Annual Meeting of the 
American Educational Research Association 
Montreal, Canada, April 11-15, 1983









INTRODUCTION

Until recently, a narrow definition of content validity has
been used when considering achievement tests. If the items
measured the test's objectives, then the test was considered to be
content valid for all examinees regardless of their background,
school attended, or instructional program. However, because of
court decisions related to minimum competency testing and the use
of such tests for high school graduation, the definition of
content validity now has been broadened to include consideration of
both curricular and instructional validity. [1, 2]

The issues of curricular and instructional validity surfaced
in the Debra P. v. Turlington case. In that case, the plaintiffs
challenged Florida's 1976 high school graduation requirement law,
which mandated that students had to pass the Florida Functional
Literacy Test (a test developed by the Department of Education),
and satisfy other requirements, to receive a high school diploma.

[1] Curricular validity refers to the match between the skills
tested and those in the curriculum; instructional validity refers
to the match between the skills tested and those taught.

[2] As Madaus (1983) indicates, the definition of content validity
historically has included the concepts of curricular and
instructional validity. In practice, however, the narrow definition
of content validity was used. For a detailed treatise on the issues
of curricular and instructional validity, read The Courts,
Validity, and Minimum Competency Testing, edited by George F.
Madaus, Boston: Kluwer-Nijhoff Publishing, 1983.



A key issue in the Debra P. litigation was whether a test
used as a graduation requirement "... should only measure that
which the schooling has offered the students" (Pullin, 1983).
Many of the arguments pertaining to this issue centered on the
definition of content validity. The defendants argued for the
narrow definition of content validity, i.e., the match between
items and skills. The plaintiffs argued that content validity had
to include curricular and instructional validity.






The appeals court agreed with the plaintiffs and ruled that
in determining the content validity of the Florida Functional
Literacy Test, the Florida Department of Education must address
the issue of whether the test covered the material taught.
Madaus (1983) accurately summarized the situation: "The court's
decision to broaden the meaning of content validity to include
evidence that pupils had been taught the materials on a
certification test adds an important new dimension to the validation
process. If the test is to be used as a graduation requirement,
then the court is asking the state for evidence that the test is
measuring things that pupils had fair opportunity to learn."

Clearly, Minimum Competency Tests (MCTs) which have been
developed with care, based upon rigorous professional standards,
should have content validity in the narrow sense. However, the
broader question is whether the MCTs, especially those used for
graduation decisions, have curricular and instructional validity.

In the high schools, there are four types of curricular
programs: college preparatory/academic, general, business/
commercial, and vocational/industrial arts. The scope of these
programs' curricular offerings differs within and among schools.
This raises an important question: can a single state-developed
Minimum Competency Test be curricularly valid across all programs
and all high schools? In broader terms, can state-developed tests
be used fairly as a requirement for high school graduation?

This study focused on curricular validity. Its purpose was
to examine the curricular validity of the New Jersey statewide
minimum competency test, considering the different high school
programs. The study also examined the change in the curricular
validity during the five years of the program's existence.

MEASURING CURRICULAR VALIDITY 

There are many methods to analyze the curricular validity of
a test. Popham and Lindheim (1981) identified two methods. The
first is based on an analysis of the instructional materials,
including textbooks, course syllabi and teachers' lesson plans. The
second involves an analysis of the interactions in the classroom.
Schmidt et al. (1983) developed a taxonomy which enabled them to
measure the content of instruction, tests and curricular
materials. The taxonomy maps the test's items into its content
specifications and permits one to determine the degree to which the
test item taxonomy map is subsumed under the specification map.
Leinhardt (1983) suggested procedures based on an analysis of the
match between scope and sequence charts and test, descriptions of
content covered, an analysis of texts by either item or computer
search, and an analysis of instruction by teacher observation.

All of these procedures, as well as similar ones suggested by
others, are difficult to apply. They rely on the collection of
considerable data from a broad range of individuals. There are
other procedures to assess curricular validity, based on tests and
test results, from which data are more readily obtained.

Harnisch & Linn (1981) provide a comparison of techniques
which can be used to identify unusual response patterns on test
items. They said that an analysis of the response patterns can be
used to discover relationships between specific tests and the
curricula. They also suggested that differences in performance
on items measuring certain skills could indicate weaknesses in
the teaching of the skills in different districts.

According to Haney (1983), using tests to examine curricular
validity has two disadvantages. First, the methods rely on the
test data. If the validity of the test is questionable, then the
use of the test results is limited. Second, the procedures are
applicable only for groups of students, not individuals; however,
the real concern is for the individual.

The limitations of the test-based procedures noted by Haney
(1983) can be overcome. As previously indicated, content validity
in the narrow sense should be assured because of the procedures and
care used to develop the test. The issue of group v. individual
analysis would be a more serious concern were one considering
instructional validity. For the analysis of curricular validity, which
is a prerequisite for an examination of instructional validity,
an examination of group results will suffice. Such an analysis
could provide information regarding differences in exposure to
different subject matter and the manner in which that subject
matter has been taught (Harnisch & Linn, 1981). Thus, the most
practical method for an initial examination of curricular
validity is based on the test results.

BACKGROUND/DATA SOURCE

The New Jersey Minimum Basic Skills Tests (MBS) have been
administered annually since spring 1978 to all public school
students in grades 3, 6, 9 and 11. These tests measure reading
and mathematics minimum basic skills which people in New Jersey
determined were the skills students must master, at a minimum, by
spring of the tested grades. In 1979 a state law was passed which
established uniform statewide high school graduation
requirements. Beginning with the ninth grade class in 1981-1982,
students have to meet certain curricular and attendance requirements
and also have to pass the ninth grade statewide test to obtain a
high school diploma.

Each test contains approximately 100 four-option multiple
choice items and takes 90 minutes to complete. All items are
rewritten each year although the skills upon which the tests are
based remain the same. An equating procedure assures the
equivalence of scores across each year's forms, and a uniform score
scale (0-100) makes consistent the reporting of the results.
Finally, in addition to reporting total test scores, scores are
reported for three reading subskill 'clusters' (word recognition,
reading comprehension and study skills) and four mathematics
subskill 'clusters' (computation, number concepts, measurement &
geometry, and problem solving & applications).

For the present research five school districts, representing
each of the five major types of school districts in New Jersey
(urban, suburban, rural, regional and vocational), were randomly
selected. Ninth grade students' results in those districts were
used because of that grade's relationship to the graduation law.
Data were obtained for the ninth grade students in the first
(1978), third (1980) and fifth (1982) year of the MBS program.

The final data element collected was the students' high
school program. Each year the ninth grade students were asked a
series of background/contextual questions. One such question
asked: "Which of the following best describes your present high
school program?" The possible responses were limited to
business/commercial, college preparatory/academic,
vocational/industrial arts, and general. This information and the
students' test results were used to examine the curricular
validity of the MBS.

METHODOLOGY 

Harnisch & Linn (1981) compared eight different indices
designed to determine whether an individual's pattern of
responses on an achievement test was unusual.

    Items which are generally difficult for most students may be
    relatively easy for students who have been in classes where
    that particular content was emphasized. Such variation from
    the norm may lead to the systematic over- or under-estimation
    of an individual's or group's level of achievement,
    distorting the measurement results.

    These indices could be used to identify individuals for whom
    the standard interpretation of the test score is misleading,
    or identify groups with atypical instructional and/or
    experiential histories that alter the relative difficulty
    ordering of the items. In addition, the items that contribute
    most to high values on an index for particular subgroups
    could be identified and judgments made regarding the
    appropriateness of the item content for those subgroups.
    (Harnisch & Linn, 1981).



A Modified Caution Index (C_i*) will be used for this study.
Harnisch & Linn (1981) concluded that C_i* was the best index to
use to examine unusual response patterns because it was the least
correlated to total test score of the eight indices they compared.

DESCRIPTION OF CAUTION INDICES

Sato developed a matrix called the Student-Problem (S-P)
Table to define an index of the degree to which an individual's
response pattern is unusual (see Tatsuoka, 1978). Each row of the
matrix represents an examinee while each column represents an
item. Cell entries are either ones for correct responses or zeros
for incorrect responses. The columns of the matrix are arranged
from left to right in ascending order of item difficulty; the
rows are arranged from top to bottom in descending order of total
number of correct answers.

If the items on a test formed a perfect Guttman Scale
(Guttman, 1941), the S-P Table would consist of all ones in the
upper left corner and all zeros in the lower right corner. Anyone
who responded correctly to a difficult item would have answered
all easier items correctly. There would be no unusual response
patterns because everyone with a given total score would have the
same response pattern. However, because perfect Guttman Scales
are unlikely on achievement tests, a typical S-P Table will be
characterized by mostly (but not all) ones in the upper left
corner and mostly (but not all) zeros in the lower right corner.
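
The ordering that defines the S-P Table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sp_table` and the 4 x 4 response matrix are hypothetical, not taken from the study, and "easiest item" is taken to mean the item with the most correct responses.

```python
def sp_table(responses):
    """Arrange a 0/1 response matrix into Student-Problem (S-P) order.

    responses: list of rows, one per examinee, each a list of 0/1 item scores.
    Returns (table, examinee_order, item_order).
    """
    n_items = len(responses[0])
    # Column totals: number of examinees answering each item correctly.
    item_totals = [sum(row[j] for row in responses) for j in range(n_items)]
    # Columns run from the easiest item (most correct answers) to the hardest.
    item_order = sorted(range(n_items), key=lambda j: -item_totals[j])
    # Rows run from the highest-scoring examinee to the lowest.
    examinee_order = sorted(range(len(responses)),
                            key=lambda i: -sum(responses[i]))
    table = [[responses[i][j] for j in item_order] for i in examinee_order]
    return table, examinee_order, item_order


# Hypothetical data: 4 examinees (rows) by 4 items (columns).
raw = [
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
]
table, examinees, items = sp_table(raw)
for row in table:
    print(row)
```

With this particular made-up matrix the sorted table is a perfect Guttman pattern: ones fill the upper left and zeros the lower right. Real achievement data would show exceptions on both sides of that boundary.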

Sato (1975) developed an index based on the S-P Table called
the Caution Index (C_i). C_i provides information about an examinee
which is not contained in the total score. Examinees with large
values for C_i have unusual response patterns. Harnisch &
Linn (1981) suggest that "unusual response patterns may result
from guessing, carelessness, high anxiety, an unusual
instructional history or other experiential background, a localized
misunderstanding that influences responses to a subset of items,
or copying a neighbor's answers to certain questions." Thus,
those students' test scores should be interpreted with caution.

Sato's Caution Index for the i-th examinee is as follows:

\[
C_i = \frac{\displaystyle\sum_{j=1}^{n_{i.}} (1 - u_{ij})\, n_{.j}
            \;-\; \sum_{j=n_{i.}+1}^{J} u_{ij}\, n_{.j}}
           {\displaystyle\sum_{j=1}^{n_{i.}} n_{.j}
            \;-\; n_{i.} \Bigl( \sum_{j=1}^{J} n_{.j} / J \Bigr)}
\tag{1}
\]

where

    i = 1, 2, ..., I indexes each of the I examinees;

    j = 1, 2, ..., J indexes each of the J items;

    u_ij = 1 if examinee i answers item j correctly,
           0 if examinee i answers item j incorrectly;

    n_i. = number correct for the i-th examinee;

    n_.j = number of correct responses to the j-th item.
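
As a rough illustration of equation (1), the sketch below computes Sato's index for a single examinee. It assumes the items are already in S-P Table order (easiest first), and the function name `caution_index` and the item totals are hypothetical values chosen for the example, not data from the study.

```python
def caution_index(u, item_totals):
    """Sato's Caution Index C_i for one examinee.

    u: 0/1 responses of the examinee, items ordered easiest to hardest.
    item_totals: n_.j, number of examinees answering each item correctly,
                 in the same order as u.
    """
    J = len(u)
    n_i = sum(u)  # n_i., number correct for this examinee
    # Easy items (the first n_i.) that were missed, weighted by n_.j.
    missed_easy = sum((1 - u[j]) * item_totals[j] for j in range(n_i))
    # Hard items (the remaining ones) that were answered correctly.
    passed_hard = sum(u[j] * item_totals[j] for j in range(n_i, J))
    denom = sum(item_totals[:n_i]) - n_i * sum(item_totals) / J
    if denom == 0:
        # Happens for zero or perfect scores, or when all items are equally hard.
        raise ValueError("caution index is undefined for this pattern")
    return (missed_easy - passed_hard) / denom


# Hypothetical item totals for 5 items answered by 10 examinees.
totals = [9, 7, 5, 3, 1]
print(caution_index([1, 1, 1, 0, 0], totals))  # Guttman-consistent pattern
print(caution_index([0, 1, 1, 1, 0], totals))  # easiest item missed
print(caution_index([0, 1, 1, 1, 1], totals))  # high scorer missing the easiest item
```

With these made-up totals the three patterns give 0.0, 1.0 and 2.0, so the index is not confined to the interval from 0 to 1.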



The problem with C_i is that large values may occur,
especially in cases where a very high-scoring examinee misses one easy
item. Harnisch & Linn (1981) developed a modified version of C_i
(called C_i*) which has a lower bound of 0 and an upper bound of
1. Establishing the bounds about the index eliminates extreme
scores which may be obtained on C_i.

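The Modified Caution Index for the i-th examinee is:

\[
C_i^{*} = \frac{\displaystyle\sum_{j=1}^{n_{i.}} (1 - u_{ij})\, n_{.j}
                \;-\; \sum_{j=n_{i.}+1}^{J} u_{ij}\, n_{.j}}
               {\displaystyle\sum_{j=1}^{n_{i.}} n_{.j}
                \;-\; \sum_{j=J+1-n_{i.}}^{J} n_{.j}}
\tag{2}
\]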
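
A minimal Python sketch of equation (2) follows. It is not the author's FORTRAN IV program; the function name `modified_caution_index` and the item totals are hypothetical, and the items are assumed to be ordered from easiest to hardest.

```python
def modified_caution_index(u, item_totals):
    """Modified Caution Index C_i* for one examinee.

    u: 0/1 responses of the examinee, items ordered easiest to hardest.
    item_totals: n_.j, number of examinees answering each item correctly,
                 in the same order as u.
    """
    J = len(u)
    n_i = sum(u)  # n_i., number correct for this examinee
    # Numerator: weighted easy items missed minus weighted hard items passed.
    missed_easy = sum((1 - u[j]) * item_totals[j] for j in range(n_i))
    passed_hard = sum(u[j] * item_totals[j] for j in range(n_i, J))
    # Denominator: totals of the n_i. easiest items minus the n_i. hardest items.
    denom = sum(item_totals[:n_i]) - sum(item_totals[J - n_i:])
    if denom == 0:
        # Happens for zero or perfect scores, or when all items are equally hard.
        raise ValueError("modified caution index is undefined for this pattern")
    return (missed_easy - passed_hard) / denom


# Hypothetical item totals for 5 items answered by 10 examinees.
totals = [9, 7, 5, 3, 1]
print(modified_caution_index([1, 1, 1, 0, 0], totals))  # Guttman-consistent
print(modified_caution_index([1, 0, 1, 1, 0], totals))  # one easy item traded for a harder one
print(modified_caution_index([0, 0, 1, 1, 1], totals))  # fully reversed pattern
```

The denominator is the numerator's value for the fully reversed pattern with the same total score, which is why the index stays between 0 and 1.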



For the present study, a Modified Caution Index was computed
for each individual using computer programs written by the author
in the FORTRAN IV programming language. All statistical analyses
were performed using the Statistical Analysis System (SAS). An IBM
370/168 was used.

RESULTS

Tables 1 (Reading) and 2 (Mathematics) illustrate the mean
Modified Caution Index (C_i*) for students in each curricular
program within each district for each of the three years. The first
observation evident from the tables is that the mean indices for
reading were larger than those for mathematics. Thus, there is a
higher degree of unusual responses for the reading test than for
the mathematics test. This result may be related to the greater
complexity in teaching reading, especially reading comprehension,
as compared to mathematics computation.

To examine the differences among the C_i* for each situation,
the students' reading and mathematics indices were used as
dependent variables in partial hierarchical analyses of variance. The
year tested and the students' district were crossed factors; the
students' curricular program was nested within districts. Tables
3 (Reading) and 4 (Mathematics) present the results.



TABLE 1

MEAN MODIFIED CAUTION INDICES
FOR THE MBS READING TEST

Note: Numbers of students are shown in parentheses.

1978
District          Business      Academic      Vocational    General       Total
A (Vocational)    .343 (5)      .242 (11)     .314 (180)    .307 (31)     .310 (227)
B (Rural)         .305 (21)     .341 (113)    .290 (7)      .291 (94)     .316 (235)
C (Suburban)      .303 (207)    .335 (226)    .310 (63)     .300 (107)    .315 (603)
D (Regional)      .343 (10)     .311 (254)    .315 (4)      .339 (65)     .355 (333)
E (Urban)         .294 (63)     .304 (220)    .313 (38)     .302 (128)    .303 (449)
Total             .304 (306)    .334 (824)    .312 (292)    .305 (425)    .319 (1847)

1980
District          Business      Academic      Vocational    General       Total
A (Vocational)    .317 (7)      .326 (3)      .304 (186)    .306 (17)     .304 (213)
B (Rural)         .317 (13)     .356 (114)    .360 (9)      .324 (101)    .340 (237)
C (Suburban)      .314 (172)    .321 (203)    .317 (81)     .306 (132)    .329 (588)
D (Regional)      .308 (8)      .381 (176)    .363 (7)      .353 (54)     .372 (245)
E (Urban)         .310 (88)     .317 (201)    .318 (44)     .318 (131)    .316 (464)
Total             .313 (288)    .341 (697)    .312 (327)    .320 (435)    .325 (1747)

1982
District          Business      Academic      Vocational    General       Total
A (Vocational)    .299 (12)     .338 (11)     .322 (188)    .302 (15)     .320 (226)
B (Rural)         .291 (7)      .332 (88)     .302 (7)      .344 (67)     .334 (169)
C (Suburban)      .317 (124)    .347 (199)    .341 (41)     .305 (110)    .329 (474)
D (Regional)      .337 (5)      .373 (152)    .431 (4)      .350 (77)     .366 (238)
E (Urban)         .287 (75)     .329 (178)    .290 (39)     .295 (121)    .308 (413)
Total             .306 (223)    .346 (628)    .321 (279)    .318 (390)    .328 (1520)




TABLE 2

MEAN MODIFIED CAUTION INDICES
FOR THE MBS MATHEMATICS TEST

Note: Numbers of students are shown in parentheses.

1978
District          Business      Academic      Vocational    General       Total
A (Vocational)    .223 (5)      .258 (11)     .235 (180)    .237 (31)     .236 (227)
B (Rural)         .243 (21)     .301 (113)    .255 (7)      .262 (94)     .279 (235)
C (Suburban)      .222 (207)    .264 (226)    .239 (63)     .240 (107)    .243 (603)
D (Regional)      .278 (10)     .308 (254)    .305 (4)      .302 (65)     .306 (333)
E (Urban)         .243 (63)     .235 (220)    .250 (38)     .243 (128)    .240 (449)
Total             .230 (306)    .275 (824)    .239 (292)    .255 (425)    .257 (1847)

1980
District          Business      Academic      Vocational    General       Total
A (Vocational)    .252 (7)      .219 (3)      .264 (186)    .265 (17)     .263 (213)
B (Rural)         .263 (13)     .308 (114)    .278 (9)      .281 (101)    .293 (237)
C (Suburban)      .250 (172)    .271 (203)    .262 (81)     .254 (132)    .260 (588)
D (Regional)      .310 (8)      .321 (176)    .346 (7)      .287 (54)     .314 (245)
E (Urban)         .272 (88)     .264 (201)    .268 (44)     .242 (131)    .260 (464)
Total             .259 (288)    .288 (697)    .266 (327)    .261 (435)    .272 (1747)

1982
District          Business      Academic      Vocational    General       Total
A (Vocational)    .225 (12)     .315 (11)     .276 (188)    .268 (15)     .275 (226)
B (Rural)         .348 (7)      .366 (88)     .286 (7)      .343 (67)     .353 (169)
C (Suburban)      .277 (124)    .316 (199)    .298 (41)     .259 (110)    .291 (474)
D (Regional)      .254 (5)      .356 (152)    .304 (4)      .327 (77)     .343 (238)
E (Urban)         .261 (75)     .321 (178)    .251 (39)     .272 (121)    .289 (413)
Total             .271 (223)    .334 (628)    .276 (279)    .291 (390)    .303 (1520)




Table 3

Summary of the Analysis of Variance
For the Reading Modified Caution Index

                                Sum of       Mean
Effect                D.F.      Squares      Square      F

District                 4       0.291       0.073      1.98
Program (District)      15       0.552       0.037      3.08*
Year                     2       0.051       0.026      2.15
District*Year            8       0.116       0.015      1.22
Prog(Dist)*Year         30       0.355       0.012      1.04
Within Cell           5054      57.392       0.011
Total                 5113      60.285

* p < .01



Table 4

Summary of the Analysis of Variance
For the Mathematics Modified Caution Index

                                Sum of       Mean
Effect                D.F.      Squares      Square      F

District                 4       0.452       0.113      2.05
Program (District)      15       0.830       0.055      3.24*
Year                     2       0.288       0.144      8.44*
District*Year            8       0.076       0.010      0.56
Prog(Dist)*Year         30       0.512       0.017      1.44
Within Cell           5054      59.648       0.012
Total                 5113      65.916

* p < .01

There was no significant year effect for the reading test.
However, there was such a significant effect for the mathematics
test (p < .01). Scheffe's multiple comparison test showed that the
mean C_i* for the students tested in 1978 (X̄ = 0.257) was
significantly smaller than that for 1980 (X̄ = 0.272), which was
significantly smaller than the 1982 result (X̄ = 0.303).

Both 'year' results are fairly curious ones. A larger C_i*
is associated with a more unusual response pattern, due in part
perhaps to lack of curricular validity. One might reasonably
expect that the C_i*'s should decrease over time (i.e., as the
skills are included in the curriculum) rather than either remain
the same (reading) or increase (mathematics).

A possible explanation for these results is that since both
tests' mean scores increased from 1978 to 1982, the increase was
due to a better mastery of those skills included in the
curriculum, but not of those skills not in the curriculum. Thus, students
were scoring higher in 1980 than they did in 1978 and higher in
1982 than they did in 1980. If higher scoring students missed
easy items (which were not in their curriculum), their value of
C_i* would be greater than that for students with lower scores who
missed the same items. This interpretation assumes that the
curriculum did not change over time to reflect the tested material.

The significant curricular program effect (p < .01) for both
reading and mathematics indicates that, summed over the three
years, there were significant differences in C_i* for the various
curricular programs within each district. This significant effect
can be further analyzed using Scheffe comparisons. However, that
result would only identify the curricular program(s) which had
significantly larger values of C_i* than others. For purposes of
examining curricular validity, it is more important to assume
that the differences exist and to analyze the cause of the
unusual response patterns, especially since the means were large.

A second series of analyses was conducted to identify the
subsets of items which contributed most to the Modified Caution
Indices for each curricular program and district for each year.
Following the procedures of Harnisch & Linn (1981), the test
results were evaluated using linear regression analyses. The
proportion of students who correctly answered each item (p-value)
was computed for each of the 110 reading and 35 mathematics items
for each appropriate unit. Mean test performance is directly
related to item p-values; thus, the regression analyses were
performed on the p-values for each appropriate unit with the
p-values from the state results for each year.

The expected item p-values for each unit were determined
from the regression equation and a residual was computed for each
item. Then items were categorized according to their content. The
reading test was divided into its three clusters (word
recognition, reading comprehension and study skills); the mathematics
test into its four (computation, number concepts, measurement &
geometry, and problem solving). Finer groupings of the items into
the subskills which compose the clusters were not meaningful
because each subskill is assessed by a very small number of items.

The mean residual for each cluster was computed and
standardized by dividing it by the standard error of estimate. Those
standardized mean residuals were multiplied by the square root of
the number of items in the cluster. That resulted in weighted
standardized mean residuals which, as Harnisch & Linn (1981) note,
are analogous to critical ratios. The weighted standardized mean
residuals were used to compare the items in each cluster.
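
The residual computation described above can be sketched as follows. This is a minimal illustration, not the SAS procedure used in the study: the function name, the six p-values, and the cluster labels are hypothetical, and it assumes the unit's p-values are regressed on the state p-values by ordinary least squares, with the standard error of estimate taken as sqrt(SSE / (n - 2)).

```python
import math


def weighted_standardized_mean_residuals(unit_p, state_p, clusters):
    """Return {cluster: weighted standardized mean residual}.

    unit_p:   item p-values for the unit (district or program).
    state_p:  item p-values for the state, same item order.
    clusters: cluster label for each item, same item order.
    """
    n = len(unit_p)
    mean_x = sum(state_p) / n
    mean_y = sum(unit_p) / n
    sxx = sum((x - mean_x) ** 2 for x in state_p)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(state_p, unit_p))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed unit p-value minus the p-value expected from the state.
    residuals = [y - (intercept + slope * x) for x, y in zip(state_p, unit_p)]
    # Standard error of estimate of the regression.
    see = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    grouped = {}
    for name, r in zip(clusters, residuals):
        grouped.setdefault(name, []).append(r)
    # Mean residual / SEE, weighted by the square root of the cluster size.
    return {name: (sum(rs) / len(rs)) / see * math.sqrt(len(rs))
            for name, rs in grouped.items()}


# Hypothetical p-values for six items in three two-item clusters.
state_p = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
unit_p = [0.92, 0.83, 0.60, 0.52, 0.55, 0.44]
clusters = ["A", "A", "B", "B", "C", "C"]
result = weighted_standardized_mean_residuals(unit_p, state_p, clusters)
for name, value in result.items():
    print(name, round(value, 2))
```

In this made-up example the unit does worse than expected on the two items of cluster "B", so that cluster receives a negative value while the other two are positive.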

The first regression was performed on the p-values for each
district. Table 5 reports those results for each district in each
of the three years. Values greater than 2.0 indicate that items
in that cluster were much easier for the students in that school
than would be expected from their overall performance and the
relative difficulty of those items for the population of students
in the particular year. A value less than -2.0 indicates that the
items were much harder. Seven of the entries (6.7%) had weighted
standardized mean residuals greater than 2.0 while 9 (8.6%) had
values less than -2.0.



TABLE 5

Weighted Standardized Mean Residuals Of District
Item P-Values By Content Category For Each Year

                                   Content Category

                          Reading                      Mathematics
District  Year    Word     Read     Study     Compu-   Number   Meas. &   Prob.
                  Rec.     Comp.    Skills    tation   Conc.    Geom.     Solve

A         1978    1.86    -2.04    -0.11      1.24    -0.28    -1.37     -0.13
          1980    0.45    -0.33     0.06      2.54     0.43    -3.60     -0.35
          1982    1.58    -0.73    -0.69     -0.38     2.80    -0.88     -1.13

B         1978   -0.07     0.10     0.21     -0.06    -0.73     1.04     -0.35
          1980   -0.88     1.17    -1.14     -0.76     0.18    -0.28      1.32
          1982    0.77    -0.43    -0.19     -1.15     0.73     1.36     -0.35

C         1978   -3.13     1.33     1.62      0.09    -0.85    -1.81      2.70
          1980   -3.55     0.80     3.27      0.27    -0.73    -1.79      2.27
          1982   -2.01     1.22     0.32     -1.88     1.53     1.15      0.22

D         1978   -0.53     1.04    -1.63     -1.49     2.12     1.03     -0.89
          1980    0.52     0.66    -2.04     -3.44     0.56     2.53      1.99
          1982   -0.84     1.13    -1.13     -1.81     0.92     1.56      0.25

E         1978    1.55    -0.40    -1.23      0.07     0.03    -1.61      1.66
          1980    1.60    -1.78     1.36      0.59    -0.07    -2.11      1.51
          1982   -0.02    -0.39     0.80      1.26     1.15    -3.15      0.25






The interest lies with the large negative values. The most
striking results from Table 5 are the residuals for which there
were large negative values for all three years: the Measurement
& Geometry items for District A, the Word Recognition items for
District C, the Study Skills and Computation items for District
D, and the Measurement & Geometry items for District E. The
consistently large negative entries for these areas were in
contrast to the other districts' values for those clusters. Thus,
these results suggest that the skills measured by those clusters
may be included in the curriculum of the other districts, but not
in the cited ones.

To further examine the results from Table 5, another
regression analysis was conducted in which the unit of analysis was the
curricular program within each district rather than the entire
district. This analysis was conducted to examine whether there
were differences in the mean residuals across the four types of
programs. Table 6 presents these results for the districts and
clusters which were noted as anomalous in Table 5.

As noted in Table 6, the large residuals persisted for
District A's Measurement & Geometry items for all the curricular
programs except for the College Preparatory one. Thus, it appears
that Measurement & Geometry was emphasized more in the College
Preparatory curriculum than in the other three. District D's
Study Skills items behaved in the same manner. Those skills may
not have been stressed in the College Preparatory or General
programs to the same extent that they were in the other two.



TABLE 6

Weighted Standardized Mean Residuals of Program P-Values
For Certain Districts And Certain Content Categories

                         Instructional Program

District   Business/     Vocational    College        General
Year       Commercial                  Preparatory

District A (Measurement & Geometry)

1978 -1.77 -0.41

1980 -0.70 2.86

1982 -0.87 0.80

District C (Word Recognition)

1978 -2.49

1980 -1.88

1982 -1.34

District D (Study Skills)
1978 0.01
1980 0.55
1982 -0.22

District D (Computation)
1978 -0.87

1980 -1.48
1982 -0.52



-3.81 ' 
-2. 0^ 



-1.30 
-2. 34 
-0. 84 



-1.69 
-3.47 
-1.4(2 



-1.43 
-3.60 
-0. 72 



•1.64 ^ 
-1.71 
-0.68 ■ 



District E (Measurement & Geometry)

1978 -2.61 0.42

1980 -2.64 -0.26

1982 -3.39 -1.15



-0.93 
0.62 \ 
-0.72 



-1.73 
-1.76 

-0.70 



-0.21 
-2. 12 
-0.58 



0.02 

t1. 39 
-1.79 



-1.24 
-1.04 
-0.06 



-1.90 
-1. 18 
-1. 33 



0.55 
-1. 35 
-1'. 33 



-1. 92 
-2.23 
-3. 13 






A similar conclusion can be drawn for District E's
Measurement & Geometry skills. The lower mean residual for the Business
and General programs compared to the other two programs suggests
a difference in the emphasis of these skills across programs.
Finally, for District C's Word Recognition items and District D's
Computation items, there was no discernible difference in the
mean residuals across the programs, indicating no differences in
the curriculum-to-test match across programs. However, the
negative residuals did indicate a lower than expected performance.






It would be of interest to examine the relationships over
time to determine the effect of the testing program's impact on
changes in specific curricula. One can examine Tables 5 and 6 to
determine trends of larger values of the residuals from 1978 to
1982. Yet, as previously stated, there is a confounding of
increases in total test score which impacts on the values of the
residuals. What one is able to conclude is that within each year
the mean residuals reflect the relationship between that year's
statewide performance and the expected performance of the units.
Interpretations of comparisons among years may be tenuous.

Summary

This study examined the curricular validity of the New
Jersey Minimum Basic Skills (MBS) test, a minimum competency test
which is used for high school graduation decisions. Based on
examinations of a Modified Caution Index, there were certain
differences in the unusual response patterns. Further, the reading
indices were larger than the mathematics indices, indicating that
there may have been a greater match between the curriculum and
the mathematics test than with the reading test.
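
For readers who wish to see how such an index behaves, the following is a minimal sketch of a Modified Caution Index computation. It assumes the form of the index given by Harnisch and Linn (1981), in which items are ordered from easiest to hardest and each examinee's pattern is compared with the perfect Guttman pattern for the same number correct. The function name and the small response matrix are hypothetical illustrations and are not taken from the MBS data or from the analyses reported here.

```python
def modified_caution_index(responses):
    """Return one index per examinee for a 0/1 response matrix.

    Assumed form (after Harnisch and Linn, 1981): with items sorted
    from easiest to hardest and k = examinee's number correct,
      numerator   = weighted misses among the k easiest items
                    minus weighted successes among the remaining items
      denominator = sum of item totals for the k easiest items
                    minus sum of item totals for the k hardest items
    The weights are the item totals (number of examinees correct).
    """
    n_items = len(responses[0])
    # Number of examinees answering each item correctly.
    item_totals = [sum(row[j] for row in responses) for j in range(n_items)]
    # Order items from easiest (largest total) to hardest.
    order = sorted(range(n_items), key=lambda j: -item_totals[j])
    totals = [item_totals[j] for j in order]

    indices = []
    for row in responses:
        u = [row[j] for j in order]
        k = sum(u)
        denom = sum(totals[:k]) - sum(totals[n_items - k:])
        if denom == 0:
            # Undefined for zero or perfect scores (and when the easiest
            # and hardest k items have equal totals).
            indices.append(None)
            continue
        missed_easy = sum((1 - u[j]) * totals[j] for j in range(k))
        passed_hard = sum(u[j] * totals[j] for j in range(k, n_items))
        indices.append((missed_easy - passed_hard) / denom)
    return indices


# Hypothetical data: three Guttman-consistent examinees and one who
# misses the easy items but answers the hardest item correctly.
example = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
```

In this illustration the first three examinees receive an index of 0 and the fourth receives an index of 1, so larger values flag more unusual response patterns.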

There was also no difference in the mean Modified Caution
Index for mathematics over time, which could indicate a possible
lack of improved consistency between the schools' curriculum and
the content of the test. The C for reading increased
significantly over time, indicating perhaps a greater disparity
between the curriculum and the test, certainly an anomalous and
unexpected result given the importance placed on the MBS test by
the public reporting of the results. The greater complexity in
teaching reading than mathematics, as well as the improvement in
scores from 1978 to 1982, are likely to be reasons for these
results.


For both reading and mathematics, the mean Modified Caution
Indices were large enough to suggest that there were very unusual
response patterns. Regression analyses were conducted to examine
the anomalous situations. The results of these analyses showed
that there were differences between curricular programs within a
school district in terms of the unusual response patterns. This
result suggests that within districts there may be differences in
the content coverage and emphasis placed on some of the subsets
of items contained on an MCT.

It is not necessarily true that unusual response patterns
are the result of a lack of a match between the content covered
on a test and the curriculum. As Harnisch and Linn (1981) note,
there may be many explanations for the unusual patterns. Thus,
one cannot conclude from this study that one Minimum Competency
Test can (or cannot) be curricularly valid for students in varying
curricular programs in different districts. However, because
differences were noted across districts and curricular programs,
there is the suggestion that there may be problems using one
test. Other, more detailed non-test-based analyses should be
conducted to further examine the curricular validity and also the
instructional validity. Such information would be very beneficial
for school districts to have for planning purposes.

The analyses conducted in this study provide an initial
insight into possible differences between test and curricular
matches. Such analyses are useful for detecting mismatches so
that corrective action can be taken. If students are to be held
accountable by having their graduation decisions based in part on
test results, it is critical that the test be content valid in
its broadest sense.






REFERENCES 



Guttman, L., The quantification of a class of attributes: A
theory and method of scale construction. In P. Horst, P. Wallin,
& L. Guttman (Eds.), The prediction of personal adjustment. New
York: Social Science Research Council, Committee on Social
Adjustment, 1941.

Haney, W., Validity and competency tests: The Debra P. case,
conceptions of validity, and strategies for the future. In G.
Madaus (Ed.), The Courts, Validity, and Minimum Competency
Testing, Boston: Kluwer-Nijhoff Publishing, 1983.

Harnisch, D. L. & Linn, R. L., Analysis of item response patterns:
Questionable test data and dissimilar curriculum practices.
Journal of Educational Measurement, 1981, 18, 133-146.

Leinhardt, G., Overlap: testing whether it is taught. In G.
Madaus (Ed.), The Courts, Validity, and Minimum Competency
Testing, Boston: Kluwer-Nijhoff Publishing, 1983.

Madaus, G. F., Minimum competency testing for certification: The
evolution and evaluation of test validity. In G. Madaus (Ed.),
The Courts, Validity, and Minimum Competency Testing, Boston:
Kluwer-Nijhoff Publishing, 1983.

New Jersey State Department of Education, New Jersey Minimum
Basic Skills Testing Program, 1977-78: Directory of test
specifications and items, 1978.

New Jersey State Department of Education, New Jersey Minimum
Basic Skills Testing Program, 1979-80: Directory of test
specifications and items, 1980.

New Jersey State Department of Education, New Jersey Minimum
Basic Skills Testing Program, 1981-82: Directory of test
specifications and items, 1982.

New Jersey State Department of Education, New Jersey Minimum
Basic Skills Testing Program, 1977-78: State report: Analysis and
interpretation of statewide performance, 1978.

New Jersey State Department of Education, New Jersey Minimum
Basic Skills Testing Program, 1979-80: State report: Analysis and
interpretation of statewide performance, 1980.

New Jersey State Department of Education, New Jersey Minimum
Basic Skills Testing Program, 1981-82: State report: Analysis and
interpretation of statewide performance, 1982.






Pullin, D., Debra P. v. Turlington: Judicial standards for
assessing the validity of minimum competency tests. In G. Madaus
(Ed.), The Courts, Validity, and Minimum Competency Testing,
Boston: Kluwer-Nijhoff Publishing, 1983.

Sato, T., [The construction and interpretation of S-P tables].
Tokyo: Meiji Tosho, 1975.

Schmidt, W. H., Porter, A. C., Schwille, J. R., Floden, R. E. and
Freeman, D. J., Validity as a variable: Can the same certification
test be valid for all students. In G. Madaus (Ed.), The Courts,
Validity, and Minimum Competency Testing, Boston: Kluwer-Nijhoff
Publishing, 1983.

Tatsuoka, M. M., Recent psychometric developments in Japan:
Engineers grapple with educational measurement problems. Paper
presented at the ONR Contractors Meeting on Individualized
Measurement, Columbia, Mo., 1978.







